How do we remember the past in randomised strategies? 
Graph games of infinite length are a natural model for open reactive processes: one player represents
the controller, trying to ensure a given specification, and the other represents a hostile environment. 
The evolution of the system depends on the decisions of both players, supplemented by chance. 

In this work, we focus on the notion of randomised strategy. More specifically, we show that 
three natural definitions may lead to very different results: in the most general cases, an
almost-surely winning situation may become almost-surely losing if the player is only allowed to use a
weaker notion of strategy. In more reasonable settings, translations exist, but they require infinite 
memory, even in simple cases. Finally, some traditional problems become undecidable for the
strongest type of strategies. 



"You can't have a strategy against telepaths: you have to act randomly. You have to not know what 
you're going to do next. You have to shut your eyes and run blindly. The problem is: how can you
randomise your strategy, yet move purposefully towards your goal?" 

Solar Lottery 
Philip K. Dick 



1 Introduction 

Since their introduction to verification in the late eighties, graph games have emerged as the model of 
choice for problems about open systems, where a controller (Eve) must interact with an a priori hostile 
environment (Adam) [PR89]. In such games, an arena — i.e. a graph — models the system and its 
evolution: at the beginning, a token is laid on one of the vertices, and its moves are determined by the 
actions of the players, supplemented by chance. The infinite sequence of vertices that ensues constitutes 
a play of the game, whose winner is defined by some predetermined specification, often given as a regular 
condition on infinite words [MP92].

This model has been studied in many variants, in terms of both arenas and objectives.
However, the questions are nearly always the same: Is there a winning strategy? For which player? How 
complex is it, in terms of memory and randomisation? Memory is the quantity of information that one 
is allowed to remember from the past: in general, the whole history is available, but it is often enough to 
remember a finite quantity of information. In addition, a strategy is pure if it proposes only one possible
action after any given sequence of observations. The notion of randomised strategy, and its relation to 
memory, is the subject of this paper. 

In verification, "randomised strategy" usually refers to a function from the history to a probability 
distribution over the actions. In other domains, such strategies are called "behavioural strategies", as 
opposed to two other models of randomised strategies: a mixed strategy is a measure over pure
strategies, and a general strategy is a measure over behavioural strategies. These models are also relevant in
computer science. Indeed, the IPv6 "Stateless Address Autoconfiguration" protocol, which only uses
randomisation at the beginning to generate a new IP address [TNJ07], can be accurately described as a
mixed strategy. Likewise, the secure shell protocol (ssh) is a general strategy, since a new session key
has to be randomly¹ generated every hour or gigabyte [YL06].

In this paper, we propose definitions for mixed and general strategies, with or without memory, in the 
framework of graph games for verification. We expose several situations in which their analyses differ 
significantly from the behavioural model. In the most general case, the same game can be almost-surely 
losing or almost-surely winning depending on the type of strategies we consider. In other situations, 
we conjecture that the values are the same, but we show that memory needs vary (from two to infinity). 
Altogether, we hope to ask more questions than we give answers: our main objective is to describe these 
three models for randomised strategies in graph games and to point out that many problems which are 
solved for behavioural strategies are still open in the mixed and general cases. 

The paper is organised as follows. In Section 2, we recall the classical notions about graph games
in verification, in a very general framework which subsumes a large part of the literature. Section 3
presents our definitions for behavioural, mixed, and general strategies in graph games, and stresses the
fundamental differences between the three notions. Section 4 focuses on memory-related issues: it
exhibits variations in the elementary cases of concurrent safety games and simple Muller games. In
Section 5, we sum up our observations and results, and propose some open problems.

2 Definitions 

Notation 

For a finite or countable set $S$, we denote by $\mathcal{D}(S)$ the set of probability distributions over $S$, i.e. the
set of functions from $S$ to non-negative real numbers that sum up to one.

Arenas and plays 

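An arena $\mathcal{A}$ is a tuple $(\mathcal{V}, \mathcal{X}, \mathcal{Y}, \delta, \Gamma, \gamma, \Phi, \varphi, \Psi, \psi)$, where $\mathcal{V}$ is the set of vertices in the graph, $\mathcal{X}$ is
the set of actions of Eve, $\mathcal{Y}$ is the set of actions of Adam, $\delta : \mathcal{V} \times \mathcal{X} \times \mathcal{Y} \to \mathcal{D}(\mathcal{V})$ is the transition
function, $\Gamma$ is the set of colours, $\gamma : \mathcal{V} \to \Gamma$ is the colouring function, $\Phi$ is the set of signals of Eve,
$\varphi : \mathcal{V} \cup \mathcal{X} \to \Phi$ is her (possibly partial) observation function, $\Psi$ is the set of signals of Adam, and $\psi : \mathcal{V} \cup \mathcal{Y} \to \Psi$ is his
(possibly partial) observation function. Many results about graph games for verification consider only restricted arenas,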
such as: 

Synchronous: an arena is synchronous if $\varphi$ and $\psi$ are total.

Observable actions: a synchronous arena has observable actions if the restrictions of $\varphi$ to $\mathcal{X}$ and $\psi$
to $\mathcal{Y}$ are one-to-one.



¹ Except on Debian [BB08].
Perfect information: a synchronous arena has perfect information (or is concurrent) if the restrictions
of $\varphi$ and $\psi$ to $\mathcal{V}$ are one-to-one.

Simple: a concurrent arena is simple (or turn-based) if for each vertex $q$, $\delta(q, x, y)$ depends either on $x$
or on $y$, but not both.
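The "simple" restriction can be checked mechanically on a finite arena. The following is a minimal sketch, assuming a hypothetical encoding in which `delta[(q, x, y)]` is a dictionary from successors to probabilities; the two toy arenas and the name `is_simple` are illustrations, not part of the original definitions.

```python
from fractions import Fraction

def is_simple(vertices, eve_actions, adam_actions, delta):
    """Check that, at every vertex, the outcome ignores Eve's action or ignores Adam's."""
    for q in vertices:
        x0, y0 = eve_actions[0], adam_actions[0]
        ignores_eve = all(delta[(q, x, y)] == delta[(q, x0, y)]
                          for x in eve_actions for y in adam_actions)
        ignores_adam = all(delta[(q, x, y)] == delta[(q, x, y0)]
                           for x in eve_actions for y in adam_actions)
        if not (ignores_eve or ignores_adam):
            return False
    return True

ONE = Fraction(1)
X, Y = ["a", "b"], ["A", "B"]

# Turn-based toy arena (assumption): Eve alone decides at "e"; "t" is a sink.
turn_based = {("e", x, y): {("e" if x == "a" else "t"): ONE} for x in X for y in Y}
turn_based.update({("t", x, y): {"t": ONE} for x in X for y in Y})

# Matching pennies (assumption): the successor of "m" depends on both actions.
pennies = {("m", x, y): {("win" if x.upper() == y else "lose"): ONE} for x in X for y in Y}
pennies.update({(s, x, y): {s: ONE} for s in ("win", "lose") for x in X for y in Y})
```

Under this encoding, `is_simple(["e", "t"], X, Y, turn_based)` holds, while the matching-pennies arena is concurrent but not simple.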

A play on the arena $\mathcal{A}$ is a (possibly infinite) sequence $\pi = \pi_0 \pi_1 \cdots$ of states such that $\forall i < |\pi| - 1, \exists x \in \mathcal{X}, y \in \mathcal{Y}, \delta(\pi_i, x, y)(\pi_{i+1}) > 0$.
The set of plays is usually denoted $\Omega$, and the set of plays starting with
the vertex $q$ by $\Omega_q$.

Pure strategies and measures 

A pure strategy $\sigma$ for Eve (resp. $\tau$ for Adam) on the arena $\mathcal{A}$ associates an action to each finite sequence
of observations: $\sigma : \Phi^* \to \mathcal{X}$ (resp. $\tau : \Psi^* \to \mathcal{Y}$). A play $\pi$ is consistent with a strategy $\sigma$ for Eve
(resp. $\tau$ for Adam) if and only if at each step $i$, there is an action $y \in \mathcal{Y}$ for Adam (resp. $x \in \mathcal{X}$) such
that $\delta(\pi_i, \sigma(\varphi(\pi_{0..i})), y)(\pi_{i+1}) > 0$ (resp. $\delta(\pi_i, x, \tau(\psi(\pi_{0..i})))(\pi_{i+1}) > 0$). Notice that, in the case of an
asynchronous arena, actions can only change with new observations: otherwise, the same argument leads
to the same result over and over. The set of plays consistent with $\sigma$ (resp. $\tau$; $\sigma$ and $\tau$) is denoted by
$\Omega^{\sigma}$ (resp. $\Omega^{\tau}$; $\Omega^{\sigma,\tau}$). Once an initial vertex $q$ and two strategies $\sigma$ and $\tau$ have been fixed, $\Omega_q^{\sigma,\tau}$ can
naturally be made into a measurable space $(\Omega_q^{\sigma,\tau}, \mathcal{O})$, where $\mathcal{O}$ is the $\sigma$-field generated by the cones
$\{\mathcal{O}_w \mid w \in \mathcal{V}^*\}$: $\pi \in \mathcal{O}_w$ if and only if $w$ is a prefix of $\pi$. The probability measure $P_q^{\sigma,\tau}$ is recursively
defined by $P_q^{\sigma,\tau}(\mathcal{O}_q) = 1$ and:

$$\forall w \in \mathcal{V}^*,\ (r, s) \in \mathcal{V}^2,\quad P_q^{\sigma,\tau}(\mathcal{O}_{wrs}) = P_q^{\sigma,\tau}(\mathcal{O}_{wr}) \cdot \delta(r, \sigma(\varphi(wr)), \tau(\psi(wr)))(s).$$

Carathéodory's extension theorem allows us to extend $P_q^{\sigma,\tau}$ to the Borel sets of $(\Omega_q^{\sigma,\tau}, \mathcal{O})$ [Wil91].
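The recursive definition of the measure on cones can be unrolled into a loop over a finite prefix. The sketch below assumes a synchronous arena where observations are taken vertex by vertex, and a hypothetical dictionary encoding of the transition function; the toy arena and all names are illustrative assumptions.

```python
from fractions import Fraction

def cone_probability(w, q, delta, phi, psi, sigma, tau):
    """Probability of the cone of w (a tuple of vertices) from q under pure strategies."""
    if not w or w[0] != q:
        return Fraction(0)
    prob = Fraction(1)                               # base case: the cone of q has measure 1
    for i in range(len(w) - 1):
        history = w[: i + 1]
        x = sigma(tuple(phi(v) for v in history))    # Eve acts on her observations
        y = tau(tuple(psi(v) for v in history))      # Adam acts on his observations
        prob *= delta[(w[i], x, y)].get(w[i + 1], Fraction(0))
    return prob

# Toy arena (assumption): from "s", the token stays or moves to the sink "t" with equal odds.
HALF = Fraction(1, 2)
delta = {("s", "go", "A"): {"s": HALF, "t": HALF},
         ("t", "go", "A"): {"t": Fraction(1)}}
identity = lambda v: v
always_go = lambda observations: "go"
always_A = lambda observations: "A"
```

For instance, the cone of the prefix `("s", "s", "t")` gets measure 1/4 in this toy arena, as each of the two transitions has probability 1/2.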

Winning conditions and values 

A winning condition $W$ on a set of colours $\Gamma$ is a Borel subset of $\Gamma^{\infty}$. A play $\pi$ in an arena $\mathcal{A}$ on
$\Gamma$ is winning for Eve in the game $(\mathcal{A}, W)$ if $\gamma(\pi) \in W$, and winning for Adam otherwise. In a game
$\mathcal{G} = (\mathcal{A}, W)$, the pure value of a state $q$ under the strategies $\sigma$ and $\tau$, denoted $v_{\sigma,\tau}(q)$, is the measure
of $W$ under $P_q^{\sigma,\tau}$. The value for Eve of a state $q$ is the supremum of the values that she can ensure from
$q$ against any strategy of Adam. Symmetrically, the value for Adam of a state $q$ is the infimum of the
values he can defend against any strategy of Eve. In simple stochastic games, these two values coincide
and are usually called the value of $q$ [Mar98, MS98]². Writing $v_{\sigma}(q) = \inf_{\tau} v_{\sigma,\tau}(q)$ and $v_{\tau}(q) = \sup_{\sigma} v_{\sigma,\tau}(q)$:

$$v(q) = \sup_{\sigma} v_{\sigma}(q) = \inf_{\tau} v_{\tau}(q).$$

Winning criteria 

Following de Alfaro and Henzinger [dAH00], we consider several notions of winning strategies and
winning regions, depending on the chances Eve has to win. In decreasing order of difficulty, and from an
initial vertex $q$, a strategy $\sigma$ for Eve:

• is sure if any play consistent with $\sigma$ is winning for Eve;



² As a matter of fact, these papers show the quantitative determinacy, in behavioural strategies, of Borel games on concurrent
arenas. An inspection of the proof yields the same result for pure strategies in the case of simple arenas.



J. Cristau, C. David & F. Horn 






• is almost-sure if for any strategy $\tau$ for Adam, $v_{\sigma,\tau}(q) = 1$;

• ensures $\varepsilon$ if for any strategy $\tau$ for Adam, $v_{\sigma,\tau}(q) > \varepsilon$;

• is positive if for any strategy $\tau$ for Adam, $v_{\sigma,\tau}(q) > 0$;

• is heroic if for any strategy $\tau$, there is a play consistent with $\sigma$ and $\tau$ which is winning for Eve.

The sure region (resp. almost-sure region, positive region, heroic region) of Eve is the set of
vertices from which she has a sure (resp. almost-sure, positive, heroic) strategy. Furthermore, the bounded
region is the set of vertices from which Eve has a strategy ensuring a positive $\varepsilon$ and the limit-one region
is the set of vertices from which Eve has a strategy ensuring $\varepsilon$ for any $\varepsilon < 1$. The same concepts are
defined accordingly for Adam, except that we say that a strategy $\tau$ for Adam defends $\varepsilon$ if it guarantees
that, for any strategy $\sigma$ for Eve, $v_{\sigma,\tau}(q) < \varepsilon$.

3 Behavioural, mixed, and general strategies 

As soon as we deal with concurrent arenas, we cannot rely only on pure strategies to make meaningful
analyses. In the classical game of "Janken", any pure strategy is surely beaten by the appropriate
counter-strategy (paper against rock, scissors against paper, and rock against scissors), but a strategy which
plays each action with probability 1/3 eventually wins with probability 1/2. The main point of this paper is
that there are several possible definitions for the notion of "randomized strategy".
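The Janken example can be checked with exact arithmetic. The sketch below assumes that a drawn round is simply replayed and that both players use the same distribution at every round; the helper names are hypothetical. Under these assumptions, the uniform strategy wins eventually with probability 1/2 against any stationary opponent, whereas a pure strategy is beaten with certainty by its counter-strategy.

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def pure(action):
    """Distribution that plays a single action with probability 1."""
    return {a: Fraction(int(a == action)) for a in ACTIONS}

def round_outcome(eve, adam):
    """Per-round probabilities (win, draw, loss) for Eve, given two distributions."""
    win = sum(eve[x] * adam[y] for x in ACTIONS for y in ACTIONS if BEATS[x] == y)
    draw = sum(eve[x] * adam[x] for x in ACTIONS)
    return win, draw, 1 - win - draw

def eventual_win(eve, adam):
    """Probability that Eve eventually wins when draws are replayed (requires draw < 1)."""
    win, draw, _ = round_outcome(eve, adam)
    return win / (1 - draw)        # sum of the geometric series win * draw^k

uniform = {a: Fraction(1, 3) for a in ACTIONS}
```

Here `eventual_win(pure("rock"), pure("paper"))` is 0, while `eventual_win(uniform, pure(a))` is 1/2 for each action `a`.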

• a behavioural strategy returns at each step a distribution over the actions: $\Phi^* \to \mathcal{D}(\mathcal{X})$,

• a mixed strategy is a measure over pure strategies: $\mathcal{D}(\Phi^* \to \mathcal{X})$,

• a general strategy is a measure over behavioural strategies: $\mathcal{D}(\Phi^* \to \mathcal{D}(\mathcal{X}))$.

As we show in this paper, the expressive powers of these models are quite different. Intuitively, a
behavioural strategy does not know in advance what it will play next, so its actions can change when its
decisions do not (even when there are no observations). Mixed strategies use randomization to get hidden
information at the beginning of the play, which can later be used to correlate indistinguishable actions,
e.g. playing aa or bb with probability 1/2 each. General strategies subsume both, so they can, in particular,
generate hidden information on the fly. These distinctions have mostly been overlooked in verification
(apart from a few remarks, e.g. [dAHK98, DHR08]). One reason is that the games we consider are
usually synchronous, with observable actions. On synchronous arenas, mixed strategies can simulate
behavioural strategies: as each action can be uniquely identified beforehand by its position in the play, it
is possible to define a measure which somehow makes all the random draws at the beginning of the play.
If, furthermore, the actions are observable, Kuhn's theorem states that mixed and behavioural (and thus
general) strategies have the same expressive power [Aum64].
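The correlation argument can be made concrete for two consecutive moves of a player who observes nothing in between. This is a minimal sketch with hypothetical helper names: a behavioural strategy draws each move independently, so it cannot put weight only on aa and bb unless it is deterministic, whereas a measure over the two pure plans aa and bb does so directly.

```python
from fractions import Fraction
from itertools import product

def behavioural_joint(p1, p2):
    """Joint law of two blind moves when move k is 'a' with probability pk, independently."""
    law = {}
    for (m1, q1), (m2, q2) in product([("a", p1), ("b", 1 - p1)],
                                      [("a", p2), ("b", 1 - p2)]):
        law[m1 + m2] = q1 * q2
    return law

def mixed_joint(measure):
    """Joint law induced by a measure over pure two-move plans such as 'aa' or 'bb'."""
    law = {m1 + m2: Fraction(0) for m1, m2 in product("ab", repeat=2)}
    for plan, weight in measure.items():
        law[plan] += weight
    return law

HALF = Fraction(1, 2)
correlated = mixed_joint({"aa": HALF, "bb": HALF})   # never plays ab or ba
independent = behavioural_joint(HALF, HALF)          # plays ab or ba half of the time
```

With fair coins, the behavioural strategy mismatches its two moves with probability 1/2, while the mixed strategy never does.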

These hypotheses have been inconspicuously challenged in recent papers. In this regard, the
comparison between [BGG09] and [GS09] makes for an enlightening example. At first glance, these two papers
look very similar: they both ponder the problem of the existence of almost-sure strategies in games
where both players have (asymmetric) imperfect information. A closer examination reveals the
differences: Bertrand, Genest, and Gimbert use general strategies, while Gripon and Serre use behavioural
strategies; furthermore, in the latter paper, the players cannot observe their own actions. As a
consequence, there are cases where the answer to the synthesis problem depends on which model is used.
Consider for example the synchronous arena depicted in Figure 1, where Eve cannot distinguish vertices
nor actions in the dashed area, ☹ is a losing sink state, and ☺ is her "target", for either a reachability or
a Büchi condition.
Figure 1: Who wins?



With a behavioural strategy, Eve's strategy can only depend on the length of the play. At any even
move, if her strategy is to play a with probability $p$ and b with probability $1 - p$, Adam can answer by
playing A with probability $1 - p$ and B with probability $p$, so the odds of the token going to $\alpha$ or to $\beta$
are equal (they are worth $p \cdot (1 - p)$ each). In the next step, no matter what $\sigma$ advocates, the odds of the
token going to ☺ or to ☹ will again be equal. In the reachability game, this limits Eve's prospects to half
chances. In the Büchi game, the probability that she wins drops to 0.

On the flip side, she has an almost-sure mixed strategy for both objectives: the natural "uniform"
measure over the strategies of the form $(aa \mid bb)^{\omega}$ guarantees that each sequence of two moves starting in
the initial vertex has a probability of 1/2 to send the token to ☺, and a probability of 1/2 to send the token
back to the initial vertex, no matter what Adam does. It cannot go to ☹, as Eve never plays ab or ba from
the initial vertex.

The arena of Figure 1 is synchronous, so any behavioural strategy can be emulated by a mixed one.
If we remove this hypothesis, this is not always the case, as in the one-player game of Figure 2, where Eve
is unaware of any action or vertex.



As Eve observes nothing, her strategy is completely determined by what she does on the empty
word $\Lambda$. She has only two pure strategies: $\Lambda \to$ stay and $\Lambda \to$ leave. Both lead to ☹, and so does
any mixed strategy of the form $\{p \cdot (\Lambda \to \text{stay}), (1 - p) \cdot (\Lambda \to \text{leave})\}$. The behavioural strategy
$\Lambda \to \{\frac{1}{2} \cdot \text{stay}, \frac{1}{2} \cdot \text{leave}\}$, on the other hand, yields one chance out of four to reach ☺.
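The arithmetic behind this example can be sketched as follows. The figure itself is not reproduced here, so the code relies on an assumption about the arena: Eve reaches her target exactly when she stays once and then leaves, and loses otherwise. The function names are hypothetical.

```python
from fractions import Fraction

def behavioural_value(p):
    """Chance of reaching the target when every move is 'stay' with probability p.

    Assumed arena: Eve wins if and only if she stays at the first move and leaves at the second.
    """
    return p * (1 - p)

def mixed_value(weight_on_stay):
    """Value of the mixture of the two pure strategies 'always stay' and 'always leave'."""
    always_stay = behavioural_value(Fraction(1))    # never leaves: loses
    always_leave = behavioural_value(Fraction(0))   # leaves too early: loses
    return weight_on_stay * always_stay + (1 - weight_on_stay) * always_leave

grid = [Fraction(i, 100) for i in range(101)]
best = max(grid, key=behavioural_value)
```

On this grid the best behavioural choice is `p = 1/2`, with value 1/4, while every mixed strategy has value 0.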

The case of games with perfect information and invisible actions is still open: there are mixed
strategies which cannot be imitated by any behavioural one, so we cannot hope for a "generic" translation. But
that does not rule out the possibility of specific, objective-dependent constructions which would yield a
different strategy with the same value.

Figure 2: The D.U.I. game

4 Memory issues 

A refinement of the synthesis problem asks that the controller uses only finite memory, as a natural 
requirement for implementability. Pure strategies with memory are defined in the following way: 

Definition 1 A pure strategy $\sigma$ with memory $M$ is a triple $(\sigma^i, \sigma^u, \sigma^a)$ where $\sigma^i \in M$ is the initial
memory state; $\sigma^u : (\Phi \times M) \to M$ is the memory update function, which maps a signal and a memory
state to a new memory state and is called at each new observation of Eve; and $\sigma^a : (\Phi \times M) \to \mathcal{X}$ is the
next-action function, which maps a signal and a memory state to an action and is called at each step.

Notice that any pure strategy $\sigma$ can be represented as a strategy with memory $\Phi^*$, with $\sigma^i = \Lambda$, $\sigma^u = \cdot$ (concatenation),
and $\sigma^a = \sigma$. A strategy has finite memory if $M$ is a finite set, and is memoryless if $M$ is a singleton.

Randomized strategies with (countable) memory are defined with similar tuples, except that some of 
their elements use randomization. 

Behavioural: In a behavioural strategy with memory $M$, the next-action function $\sigma^a : (\Phi \times M) \to \mathcal{D}(\mathcal{X})$ is
randomized.

Mixed: In a mixed strategy with memory $M$, the initial memory $\sigma^i \in \mathcal{D}(M)$ is randomised.

General: In a general strategy with memory $M$, the next-action, initial memory, and memory-update
$\sigma^u : (\Phi \times M) \to \mathcal{D}(M)$ are randomised.

The memory requirements can also depend on the type of strategy. In the game of Figure 1, for
example, there is no almost-sure mixed strategy with finite memory (in the reachability game, there are
$\varepsilon$-optimal strategies with finite (unbounded) memory; in the Büchi game, every mixed strategy with finite
memory has value 0). However, the strategy we described can be realized by a general strategy with four
memory states $a_{even}$, $a_{odd}$, $b_{even}$ and $b_{odd}$: in the even memory states, she updates her memory at random
to one of the odd states; in the odd states, she updates her memory to the corresponding even state; in
all states, she plays the action corresponding to her memory state.
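The four-state strategy can be simulated directly. This is a minimal sketch, assuming that the memory is updated before each action is chosen and that the random update is uniform; the encoding of memory states as pairs and the function names are hypothetical.

```python
import random

EVEN, ODD = "even", "odd"

def update(memory, rng):
    """Randomised memory update of the four-state strategy."""
    letter, parity = memory
    if parity == EVEN:
        return (rng.choice("ab"), ODD)   # draw a fresh hidden letter
    return (letter, EVEN)                # keep the letter for the second move of the pair

def next_action(memory):
    """Play the letter stored in the current memory state."""
    return memory[0]

def run(steps, seed=0, initial=("a", EVEN)):
    """Sequence of actions produced over the given number of steps."""
    rng = random.Random(seed)
    memory, moves = initial, []
    for _ in range(steps):
        memory = update(memory, rng)     # called at each new observation
        moves.append(next_action(memory))
    return "".join(moves)
```

Every run is a concatenation of blocks aa and bb, so the forbidden pairs ab and ba never occur at the positions where a pair starts, and only four memory states are ever used.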

4.1 Concurrent safety games 

In [dAHK98], de Alfaro, Henzinger, and Kupferman study the problem of concurrent reachability/safety 
games and establish the qualitative determinacy of these games, as well as several results on the nature 
(memory and randomization) of the strategies needed to achieve various objectives. In particular, they 
show that positive strategies for safety objectives require, in general, an infinite amount of memory. The 
proof is based on the famous "snowball game" of [KS81], which is pictured in Figure 3.

In this game, Adam loses if he never runs and Eve never throws, or if Eve happens to throw the 
snowball exactly at the moment he runs. It is clear that Adam has memoryless behavioural strategies 
with value arbitrarily close to one: if, at each step, he chooses to run with probability $\varepsilon$, he ensures a
probability of winning of $1 - \varepsilon$ (Eve's best chance is to throw the ball right away). It is also clear that he
cannot win almost-surely: if he has a positive probability of never running, Eve can keep the snowball 
forever; and if he has a positive probability of running at any step, Eve can thwart him by throwing the 
ball with probability 1/2 at each step.
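The value of Adam's memoryless strategy can be checked against each pure strategy of Eve. The sketch below uses only the rules stated above (Adam loses if nobody ever moves, or if the throw coincides with his run) and assumes that a pure strategy of Eve amounts to choosing the step at which she throws, if any; the function name is hypothetical.

```python
from fractions import Fraction

def adam_wins_against_throw_at(k, eps):
    """Adam runs with probability eps at each step; Eve throws at step k (None: never)."""
    if k is None:
        # Adam runs eventually with probability 1 as soon as eps is positive.
        return Fraction(1) if eps > 0 else Fraction(0)
    # Adam loses only if he has not run before step k and runs exactly at step k.
    return 1 - (1 - eps) ** k * eps

eps = Fraction(1, 10)
worst_case = min(adam_wins_against_throw_at(k, eps) for k in range(50))
```

The worst case over these strategies is reached when Eve throws right away, which gives Adam a winning probability of exactly `1 - eps`.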
Figure 3: The snowball game

By the qualitative determinacy of concurrent regular games, Eve has a positive strategy, i.e. a unique 
strategy which prevents Adam from winning almost-surely with any strategy. De Alfaro, Henzinger, 
and Kupferman use behavioural strategies, and argue that Eve needs infinite memory: the sequence 
$(\sigma(O^i)(\mathit{throw}))_{i \in \mathbb{N}}$ must go to 0 but never reach it. It is clear that there are no positive mixed strategies
with finite memory, as pure strategies with n memory states can only throw the snowball in the first n 
steps. 

On the other hand, there is a general strategy with only 2 memory states: in the memory state Never, 
Eve keeps the snowball with probability 1, in the state Eventually, she throws it with probability 1/2; the
memory never changes, and the initial memory state is chosen at random. This strategy prevents Adam 
from winning almost-surely, since he can never be sure that Eve is not in the memory state Eventually. 
In fact, this is the case in every finite concurrent safety game: 

Theorem 2 In every finite concurrent safety game, Eve has a positive general strategy with memory 2
from her positive region. 

Sketch of proof. It follows from the analysis of the fix-points in [dAHK98] that there is a total preorder $\preceq$
on the vertices such that:

• the minimal vertices belong to the almost-sure region of Adam;

• for each non-minimal vertex $q \in \mathcal{V}$, there is an action $\mathit{safe}_q \in \mathcal{X}$ of Eve such that, for any action
$y$ of Adam:

- either for any vertex $r \in \mathcal{V}$, $\delta(q, \mathit{safe}_q, y)(r) > 0 \Rightarrow q \preceq r$,

- or there is an action $x \in \mathcal{X}$ of Eve and a vertex $r \in \mathcal{V}$ such that $\delta(q, x, y)(r) > 0 \wedge q \prec r$.

Notice that always playing the safe action is a pure and positional sure strategy for Eve in the maximal 
vertices (unless they are also minimal). For the vertices in between, we claim that the following strategy 
with two memory states is positive for Eve: 

• the memory states are called Sound and Chance; 

• each time the token goes to a new $\preceq$-class, the memory state is updated to either Sound or Chance
with equal probabilities; otherwise, the memory does not change;

• in the Sound state, Eve always plays the safe action of the current vertex; 

• in the Chance state, Eve plays any action in $\mathcal{X}$ with equal probabilities.

The situation is roughly the same as in the snowball game: if Adam's actions have no chance to go to a 
lower vertex against the Sound strategy, he will lose with probability 1/2; if he takes a risk at any point,
there is a positive probability that Eve was in the Chance memory state all along, so he could end up in 
a greater vertex. □ 
In addition to the finite memory, the strategy described in the proof of Theorem 2 is simple, generic,
and uses only uniform probabilities. By comparison, the description of a positive behavioural strategy is
in general very complex and uses probabilities of unbounded precision.
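Returning to the snowball game, the two-state general strategy (Never or Eventually, chosen at random at the start) can be checked against every pure strategy of Adam. The sketch below assumes that the initial memory is drawn uniformly, that Eve throws with probability 1/2 at each step in the Eventually state, and that a pure strategy of Adam amounts to choosing the step at which he runs, if any; the function name is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def adam_win_probability(run_step):
    """Adam's chance of winning when he runs at step run_step (None: he never runs)."""
    if run_step is None:
        against_never = Fraction(0)        # nobody ever moves: Adam loses
        against_eventually = Fraction(1)   # Eve throws at some point with probability 1
    else:
        against_never = Fraction(1)        # he runs while Eve keeps the snowball
        # Eve's first throw falls exactly on step run_step with probability (1/2)^(run_step + 1).
        against_eventually = 1 - HALF ** (run_step + 1)
    return HALF * against_never + HALF * against_eventually
```

Under these assumptions, Adam wins with probability 1/2 if he never runs and with probability 3/4 if he runs immediately; no choice of `run_step` reaches 1, which is what makes Eve's strategy positive.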

4.2 Memory bounds for Muller games 

Even in the elementary case of simple Muller games, it is not clear that the memory needs are the same 
for behavioural and general strategies. Recall that a simple arena is an arena with turn-based moves and 
perfect information for both players, and a Muller condition is a condition depending only on the set of 
colours visited infinitely often: 

Definition 3 A Muller condition on a set of colours $\Gamma$ is specified by a subset $\mathcal{F}$ of $\mathcal{P}(\Gamma)$. A play $\pi$
satisfies Muller($\mathcal{F}$) if and only if the set of colours occurring infinitely often in $\gamma(\pi)$ belongs to $\mathcal{F}$.
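For an ultimately periodic play, i.e. a finite prefix followed by a cycle repeated forever, the colours occurring infinitely often are exactly those of the cycle, so the definition can be checked directly. This is a minimal sketch; the encoding of the family as a set of frozensets and the example colouring are assumptions for illustration.

```python
def infinitely_often(prefix, cycle, colour):
    """Colours seen infinitely often along the play prefix . cycle^omega (cycle non-empty)."""
    return frozenset(colour[v] for v in cycle)   # the prefix is visited only finitely often

def satisfies_muller(prefix, cycle, colour, family):
    """True if the set of colours seen infinitely often belongs to the family."""
    return infinitely_often(prefix, cycle, colour) in family

# Example colouring and condition (assumptions): Eve wins iff exactly green and blue recur.
colour = {"p": "red", "q": "green", "r": "blue"}
family = {frozenset({"green", "blue"})}
```

With this condition, the play that visits `p` once and then alternates between `q` and `r` is winning, while the play cycling through all three vertices is not.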

In such games, both players have pure optimal strategies with finite memory [BL69]. A follow-up
problem is to determine, for a given Muller condition $\mathcal{F}$ on a set of colours $\Gamma$, the necessary and sufficient
amount of memory needed to define optimal pure strategies in any arena coloured by $\Gamma$. Gurevich and
Harrington used the latest appearance record (LAR) structure of McNaughton to give a first upper bound
of $|\Gamma|!$ [GH82]. Zielonka refined the LAR into a tree, whose leaves could be used as memory [Zie98].
Finally, Dziembowski, Jurdzinski, and Walukiewicz showed that each player needs only as much
memory as the number of leaves in some particular sub-trees, establishing tight and asymmetrical bounds for
pure strategies [DJW97].
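The latest appearance record mentioned above can be sketched as a list of colours ordered by most recent visit. The code below is an illustrative assumption about the encoding (a tuple with the latest colour first, plus the position from which the colour was moved), not the construction used in the cited papers.

```python
from itertools import permutations

def lar_update(record, colour):
    """Move colour to the front of the record; also return its previous position."""
    hit = record.index(colour)
    return (colour,) + record[:hit] + record[hit + 1:], hit

def lar_run(record, colours):
    """Apply lar_update along a sequence of colours, collecting the positions hit."""
    hits = []
    for colour in colours:
        record, hit = lar_update(record, colour)
        hits.append(hit)
    return record, hits

# The records over three colours are the 3! = 6 orderings of these colours.
all_records = set(permutations("abc"))
```

For example, reading `bcbc` from the record `("a", "b", "c")` eventually only moves colours within the first two positions, which hold the two colours that keep recurring.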

It is clear from their proof that mixing strategies does not help, since the other player can efficiently
adapt their strategy in the witness arenas. This is not the case for behavioural strategies: Chatterjee,
de Alfaro, and Henzinger observed that upward-closed winning conditions admitted memoryless
strategies [CdAH04], leading to smaller upper bounds for arbitrary Muller conditions [Cha07]. Horn
established even smaller tight bounds for general strategies [Hor09] (see Figure 4 for a graphical representation
of the three bounds on a Zielonka (sub-)tree). However, Horn's upper bound has not yet been proven (or
refuted) for behavioural strategies.




(a) Pure/mixed (tight) (b) Behavioural (upper) (c) General (tight) 



Figure 4: Memory bounds for simple Muller games 






5 Discussion 

We have compared three models of randomized strategies — behavioural, mixed, and general — in the 
context of graph games. Depending on the sub-case, we were able to expose variations in the amount of 
memory needed, the existence of finite-memory strategies, or even the values. In concurrent games with 
unobservable actions, the equivalence between the three models is still an open question. 

In verification, the behavioural model has received most of the attention. Nevertheless, there is a 
priori nothing wrong with the other types of controllers. Furthermore, in several cases, general strategies 
can be much simpler than behavioural or mixed ones. On the other hand, general strategies are much less 
amenable to further analysis, as they introduce imperfect information. Even in simple safety games, one 
cannot compute the value of a general strategy, or even decide if it has positive value [GQ09].

Each model has strengths and weaknesses, and we do not favour one over the others. Our point is 
rather to stress the importance of this initial choice, and to note that many memory-related problems 
which have been solved for behavioural strategies are still open in the mixed and general frameworks. 